LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds

Authors

Rodrigo Wilkens

Leonardo Zilio

Silvio Cordeiro

Felipe Paula

Carlos Ramisch

Marco Idiart

Aline Villavicencio

Abstract

In the context of NLP tasks such as text simplification, lexicons containing information about semantically related words are an important resource for evaluating the quality of the system output. Existing resources containing lexical substitutes have been built with a focus on single words. In this paper, we present a lexical substitution dataset for Portuguese nominal compounds. The compounds have varying degrees of compositionality, conventionality and frequency, and we investigate the impact of these characteristics on the suggestions of lexical substitution made by native speakers. No strong correlations are found for these factors on the number or type of responses provided. However, a significant effect of compositionality is found in the use of one of the component words (head or modifier) as a substitute. The resulting resource, LexSubNC, contains over 1,500 manually validated substitutes for 180 compounds, further classified according to the type of response.

Similar resources

How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality

We introduce a new multilingual resource containing judgments about nominal compound compositionality in English, French and Portuguese. It covers 3 × 180 noun-noun and adjective-noun compounds for which we provide numerical compositionality scores for the head word, for the modifier and for the compound as a whole, along with possible paraphrases. This resource was constructed by native speake...

Lexical Substitution Dataset for German

This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators ...

A Preliminary Study of Croatian Lexical Substitution

Lexical substitution is a task of determining a meaning-preserving replacement for a word in context. We report on a preliminary study of this task for the Croatian language on a small-scale lexical sample dataset, manually annotated using three different annotation schemes. We compare the annotations, analyze the inter-annotator agreement, and observe a number of interesting language-specific ...

A Dataset for the Evaluation of Lexical Simplification

Lexical Simplification is the task of replacing individual words of a text with words that are easier to understand, so that the text as a whole becomes easier to comprehend, e.g. by people with learning disabilities or by children who learn to read. Although this seems like a straightforward task, evaluating algorithms for this task is not so. The problem is how to build a dataset that provide...

English Nominal Compound Detection with Wikipedia-Based Methods

Nominal compounds (NCs) are lexical units that consist of two or more elements that exist on their own, function as a noun and have a special added meaning. Here, we present the results of our experiments on how the growth of Wikipedia added to the performance of our dictionary labeling methods to detecting NCs. We also investigated how the size of an automatically generated silver standard cor...

Publication date: 2017
